Detrended fluctuation analysis is used to investigate correlations between the monthly average
of the maximum daily temperatures for different locations in the continental US and the different 
climates these locations have. When we plot the scaling exponents obtained from the detrended fluc- 
tuation analysis versus the standard deviation of the temperature fluctuations we observe crowding 
of data points belonging to the same climates. Thus, we conclude that by observing the long- 
time trends in the fluctuations of temperature it would be possible to distinguish between different 
climates. 

I. INTRODUCTION

Ever since it was first realized that humanity is ca- 
pable of changing its natural surroundings more rapidly 
than nature can repair them, model-based predictions about
climate change have been one of the main research areas
in science. While many climatologists firmly believe that 
there is strong evidence for our interference in nature in 
general and in climate in particular, a few others disagree. 

Before we can even attempt to settle this debate, there 
are quite a few questions that we need to answer about 
the climate of the past. Nature provides us with different 
forms of data which we can convert to climatic data. For 
datasets reflecting the near past, it is rather simple to obtain
information about the climate as we can always rely
on the recorded history. For example, tree ring history 
from Scandinavia indicates that there was a period of 
high temperatures between the 9th and 13th centuries, called
the "Medieval Warm Period" [1]. The tree ring temperature
history for this period agrees with the glacier data,
and has been supported by the historical records of Norse
seafaring and colonization around the North Atlantic at
the end of the 9th century. It is known that during this 
time the warmer climate helped in producing greater harvests
in Iceland and parts of Greenland [2], which in turn
helped the colonies in Greenland. But when we go to 
older times, it becomes more and more difficult to relate 
the paleoclimatic proxies to the climate of a certain region.
From paleoclimatic proxies, like tree ring indices
or stacked oxygen-isotope curves derived from deep sea 
cores, we normally obtain fluctuations of the local tem- 
peratures rather than their absolute values. Therefore, it 
is difficult to construct a method to characterize different
climates based solely on the fluctuations of temperature
values. 

Weather forecasting, to a first approximation, is a rather
simple issue. A cold day is usually followed by a cold day, 
and a warm day is usually followed by a warm day. On a 
larger scale, a colder week is usually followed by a warmer 
week which corresponds to the average duration of the 
general weather regimes. But as the longer timescales 
are governed by different processes like circulation patterns
and are sometimes even influenced by trends like global
warming, defining long-term correlations becomes more
difficult.

In order to separate the trends and the correlations we 
need to eliminate the trends in our temperature data. 
Several methods are used effectively for this purpose:
rescaled range analysis (R/S), wavelet techniques (WT) [7, 8],
and detrended fluctuation analysis (DFA) [9].

Analysis of the temperature fluctuations over a period
of decades on different parts of the globe has already
shown the effectiveness of detrended fluctuation analysis
in characterizing the persistence
of weather and climate regimes. DFA and WT
have been applied to study temperature and precipita- 
tion correlations in different climatic zones on the con- 
tinents and also in the sea surface temperature of the 
oceans. The recent results show that the temperatures 
are long-range power-law correlated. The long-term persistence
of the temperatures can be characterized by an
auto-correlation function $C(n)$ of temperature variations,
where $n$ is the time between the observations. The auto-correlation
function decays as $C(n) \sim n^{-\gamma}$. Even though
there is some disagreement on the value of the exponent
$\gamma$, the fact that the persistence of the temperatures can be
characterized by this auto-correlation function is firmly
established. Different groups have used R/S, DFA and
WT analysis and have shown that this exponent $\gamma$ has
roughly the same value $\gamma \approx 0.7$ for the continental stations.
The exponent $\gamma$ is
found to be roughly 0.4 for island stations and
sea surface temperature on the oceans. This
method has also been applied to the temperature predictions
of coupled atmosphere-ocean general circulation
models [23], but there is disagreement on
the actual value of the exponent $\gamma$. On one side it is argued
that the exponent does not change with the distance
from the oceans and is roughly $\gamma \approx 0.7$. On the
other side it is said that the scaling exponent is roughly
1 over the oceans, roughly 0.5 over the inner continents
and about 0.65 in transition regions [19, 20, 21].
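
As a reading aid for comparing the values quoted above, the relation below converts between the auto-correlation exponent $\gamma$ and the DFA fluctuation exponent (denoted $\alpha$ here). It is the standard result for stationary series with power-law correlations and $0 < \gamma < 1$; it is added as an assumption about how the quoted numbers relate and is not spelled out in the text above.

```latex
\alpha = 1 - \frac{\gamma}{2}, \qquad 0 < \gamma < 1
% worked values
\gamma = 0.7 \;\Rightarrow\; \alpha = 0.65, \qquad
\gamma = 0.4 \;\Rightarrow\; \alpha = 0.80, \qquad
\gamma \to 1 \;\Rightarrow\; \alpha \to 0.5 \ \text{(uncorrelated limit)}
```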

Previous work in this area also shows that there is
a slight variation in the scaling exponent between the
low-elevation, mountain, continental and maritime stations [3].
Even though these variations are between
$\gamma \approx 0.5$ and $\gamma \approx 0.7$, the fact that they show a correlation
with location and elevation indicates that a relationship
between the statistical nature of the temperature fluctuations
and the climate can be established.



II. METHOD

Daily temperature data have a non-stationary nature
due to seasonal trends. To remove this seasonal trend effect,
the mean temperature for each day over all the years
in the data set is determined. Subsequently, the temperature
fluctuation was obtained by calculating the deviation of the
daily maximum temperature from the mean daily maximum temperature,

$\Delta T_i = T_i - \langle T_i \rangle$    (1)

where $\langle T_i \rangle$ is the mean daily maximum temperature.
Similarly we can also use the mean daily average
temperature or the mean daily minimum temperature,
and the use of average or minimum temperature
instead of maximum temperatures does not change the
outcome of the analysis [15]. To remove the remaining
linear trends (the average temperature for some years can
be higher or lower than the average temperature of the
time series for that location as a result of atmospheric
processes), we applied DFA [9], which essentially enables
us to investigate long-term correlations in the data by
getting rid of the trends.

The standard calculation of the autocorrelation function
is hindered by the noise and nonstationarity in the
data. Instead of calculating $C(n)$ directly, and in order to reduce
the noise in the time series, a running sum of the temperature
fluctuations is considered,

$y(m) = \sum_{i=1}^{m} \Delta T_i$    (2)

where $m = 1, \ldots, N$ and $N$ is the length of the time series
in question. The fluctuations in this sum are related to
$C(n)$ by

$F(n) \sim n^{\alpha}$    (3)

Next, the time series $y(m)$ is divided into non-overlapping
intervals of equal length $n$. In each interval,
we fit $y(m)$ to a straight line ($x(m) = km + d$ for each
segment) and calculate the detrended square variability
$F^2(n)$ as

$F^2(n) = \left\langle \frac{1}{n} \sum_{m=kn+1}^{(k+1)n} \left( y(m) - x(m) \right)^2 \right\rangle$    (4)

with

$k = 0, 1, 2, \ldots, \left( \frac{N}{n} - 1 \right)$.    (5)

If the temperature fluctuations were uncorrelated (i.e.
white noise) we would expect

$F(n) \sim n^{\alpha}$    (6)

with $\alpha = \frac{1}{2}$. If $\alpha > \frac{1}{2}$, we expect long-range power law
correlations in the data for the range of values considered.
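
The procedure described above (remove the seasonal mean, form the running sum, detrend each window with a straight line, and read the exponent from the slope of log F(n) against log n) can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used for the results in this paper: the function names, the `period` argument (12 for monthly data, assuming a gap-free series) and the synthetic white-noise input are all assumptions made for the example.

```python
import math
import random


def anomalies(values, period):
    """Subtract from each value the mean of all values at the same phase of the cycle."""
    sums = [0.0] * period
    counts = [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return [v - means[i % period] for i, v in enumerate(values)]


def profile(fluct):
    """Running sum y(m) of the fluctuations, as in Eq. (2)."""
    y, total = [], 0.0
    for d in fluct:
        total += d
        y.append(total)
    return y


def dfa_fluctuation(y, n):
    """F(n): root of the mean squared residual about a straight-line fit per window."""
    segments = len(y) // n
    t_mean = (n - 1) / 2.0
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    total = 0.0
    for k in range(segments):
        seg = y[k * n:(k + 1) * n]
        y_mean = sum(seg) / n
        sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(seg))
        slope = sxy / sxx
        total += sum((v - y_mean - slope * (t - t_mean)) ** 2
                     for t, v in enumerate(seg)) / n
    return math.sqrt(total / segments)


def scaling_exponent(fluct, window_sizes):
    """Least-squares slope of log F(n) versus log n."""
    y = profile(fluct)
    xs = [math.log(n) for n in window_sizes]
    ys = [math.log(dfa_fluctuation(y, n)) for n in window_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
    den = sum((a - x_mean) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic uncorrelated noise stands in for real station data here;
    # for such input the fitted slope should come out close to 1/2.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    alpha = scaling_exponent(anomalies(noise, 12), [8, 16, 32, 64, 128, 256])
    print(round(alpha, 2))
```

The window sizes are spaced logarithmically so that each contributes equally to the fit; with real records the largest window should stay well below the record length so that enough windows remain to average over.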

Figure 1 shows F(n) for one US station as an arbitrary 
example (all other stations in our dataset, where daily 
temperature data is available, show similar behavior). 
For this station (Millinocket, Maine), we have plotted
$F(n)$ for both daily and monthly temperature data. To
calculate the monthly temperature data we took the av- 
erage of the maximum daily temperatures for each month 
and used DFA on the monthly averages. Figure 1 shows 
that for a time period longer than about 60 days, both 
the daily and monthly temperature data give us the same 
scaling exponent $\alpha$. This behavior has been observed before
for daily temperature data. As the results of
our analysis for daily and monthly data at all the sampled
locations agree for long time scales ($n > 60$ days), the
results presented in this paper have been obtained from
the analysis of monthly averages. Both the daily ($n > 60$ days)
and monthly averages span about two decades,
indicating correlations of 30 years or more. We are aware 
of the fact that by taking the monthly averages we are ef- 
fectively losing information about the dataset, i.e. about 
correlations at a short time scale. However, as our main 
interest is in finding correlations in the longer time scales, 
in the light of the information obtained from Figure 1, 
we believe that this loss does not change the behavior of 
these data at longer time scales. In addition, the much wider
and more reliable availability of monthly averages led us to
sacrifice short time scales for the sake of obtaining longer
time series of data.
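
The monthly averaging step can be illustrated with a short sketch. The record layout (a list of `(date, daily_maximum)` pairs) and the function name `monthly_means` are assumptions made for this example; the actual data files may be organized differently.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily_records):
    """Average the daily values per (year, month), returned in chronological order."""
    buckets = defaultdict(list)
    for day, tmax in daily_records:
        buckets[(day.year, day.month)].append(tmax)
    return [(key, sum(vals) / len(vals)) for key, vals in sorted(buckets.items())]


# Three made-up daily maxima spanning two months
records = [
    (date(1990, 1, 1), 2.0),
    (date(1990, 1, 2), 4.0),
    (date(1990, 2, 1), 10.0),
]
print(monthly_means(records))  # [((1990, 1), 3.0), ((1990, 2), 10.0)]
```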

In the present work we have investigated temperature 
fluctuations for 129 weather stations in the continental
US. The data have been obtained from the U.S. Historical
Climatology Network. From the available data
we have chosen the stations with the longest records. We 
did not include in our analysis datasets with data shorter 
than 75 years, and the longest dataset we had spanned 
110 years at several locations in the dataset (all ending 
in 1994). The data come from the coastal regions of Cal- 
ifornia (14 stations), Alabama (15 stations), Maine (12 
stations), New Mexico (24 stations), West Virginia (13
stations), Michigan (17 stations), Montana (14 stations), 
and Arizona (20 stations). 

For these 129 continental U.S. stations the value of the
exponent is found to be $\alpha = 0.60 \pm 0.05$. The individual
error bars for the data points are of the order of
$\Delta\alpha \approx 0.01$. Figure 2 gives a summary of the scaling
exponents obtained from these 129 stations. As we can see
from this figure, consistent with the earlier observations,
we obtain scaling exponents in the range of
0.52 to 0.72. The scaling exponents for the eight different
states are coastal California ($\alpha = 0.63 \pm 0.06$), Alabama
($\alpha = 0.56 \pm 0.03$), Maine ($\alpha = 0.60 \pm 0.04$), West Virginia
($\alpha = 0.56 \pm 0.02$), New Mexico ($\alpha = 0.64 \pm 0.04$),
Arizona ($\alpha = 0.61 \pm 0.03$), Michigan ($\alpha = 0.58 \pm 0.02$),
and Montana ($\alpha = 0.58 \pm 0.02$).

FIG. 1: Detrended variability $F(n)$ is plotted against a
timescale. The circles represent daily maximum temperatures,
the triangles represent monthly average temperatures.
The slope of both of the curves gives $\alpha = 0.53 \pm 0.03$ (the
slope for the daily maximum temperatures is calculated for
the time period longer than 60 days). Data are from the station
in Millinocket, Maine.

FIG. 2: The histogram of the scaling exponents. The average
from the 129 stations gives a scaling exponent of $\alpha = 0.60 \pm 0.05$.

It has been suggested that the value of the exponent
$\alpha$ depends on the geographic location (distance from the
oceans and elevation). To investigate this effect
of location on the exponent $\alpha$ we first looked at the correlations
between the elevation of the weather stations and
the resulting exponent $\alpha$ in Figure 3. It is very difficult
to observe any trends in the data with changing elevation
as most of the variations are within one standard
deviation. We might say that at elevations of about 200
meters (the stations at 100-250 meters give an exponent
$\alpha = 0.57 \pm 0.03$), the scaling exponent is slightly lower
than at the coastline (the stations at 0-100 meters give an
exponent $\alpha = 0.62 \pm 0.05$). Above the elevation of 250
meters the scaling exponent increases slightly with elevation.

FIG. 3: The dependence of the scaling exponent on the elevation
of the weather station.

Previous work suggested that the scaling exponent
changed from 1 over the oceans to 0.5 over the inner continents.
The coastal regions appeared as transition regions
corresponding to a scaling exponent of about 0.65
[23]. Figure 4 shows the relation between the scaling exponent
and the distance from the station to the nearest ocean.
We observe no change in the scaling exponent
for the inner continental stations, which contradicts
the previous work. However, the observational
station used in that work was about 2000 km from any
ocean, whereas in our data set the largest distance between
any of the stations and the nearest ocean is about
1600 km. Therefore we cannot comment on the possibility
that the exponent changes to smaller values for
even more "inner continental" stations. The latest work also
agrees with our data in showing no change with increasing
distance from the coastline [13].

FIG. 4: The dependence of the scaling exponent on the
distance of the weather station from the nearest coastline
(ocean).

When we look at the scaling exponents and consider
only the geographic locations of the stations it is almost
impossible to distinguish these states from each other unambiguously.
However, a striking result can be observed
if we plot the standard deviation of the temperature fluctuations
versus the scaling exponent observed from that
station. These standard deviations of the temperature
fluctuations are calculated over the whole dataset where
each point consists of a monthly average of daily maximum
temperatures. It is known that the standard deviation
is a correct measure of fluctuations only if the underlying
distribution is a pure Gaussian, and daily and
monthly temperatures are known to show skewed distributions.
The coefficient of skewness is defined as

$\mathrm{Skewness}(x_1 \ldots x_N) = \frac{1}{N} \sum_{j=1}^{N} \left[ \frac{x_j - \bar{x}}{\sigma} \right]^3$    (7)

where $\sigma = \sigma(x_1 \ldots x_N)$ is the distribution's standard
deviation. A positive value of skewness signifies a dis- 
tribution where an asymmetric tail extends out towards 
more positive x values and a negative value of skewness 
gives a distribution with an asymmetric tail extending 
out towards more negative x values. For the idealized 
case of a Gaussian distribution, the standard deviation
of the coefficient of skewness is approximately $\sqrt{6/N}$. In
real life it is a good practice to account for skewnesses
only if the coefficient of skewness obtained from the distribution
is many times larger than this value. We
have randomly chosen five of the stations to test for skewness.
For all of these stations the period of the time
series was 102 years. For comparison, the standard deviation
of the coefficient of skewness for a Gaussian distribution
with $N = 102 \times 12 = 1224$ monthly values is
$\sqrt{6/1224} = 0.07$. The coefficients of skewness
for these stations are 0.03 (Brewton, AL), 0.09 (Berkeley, 
CA), -0.05 (Crow Agency, MT), -0.04 (State University, 
NM), and -0.06 (Glenville, WV). Figure 5 shows the fre- 
quency of the deviations from the monthly average tem- 
peratures for a sample station, Glenville, WV. As a side 
note, we have performed the same skewness analysis for 
these stations using the average monthly maximum temperatures
instead of the deviation from the average maximum
temperatures. The skewness of this data was significantly
higher, supporting the point that climatic data itself
can have a strongly skewed distribution (-0.17 (Brewton,
AL), -0.28 (Berkeley, CA), -0.13 (Crow Agency, MT),
-0.17 (State University, NM), and -0.20 (Glenville, WV)).
Therefore we decided that the standard deviation of temperature
fluctuations from the average temperatures can
be used in statistical description of a dataset as long as
the climate does not change during that given period of
time.

FIG. 5: The frequency of the deviation of the monthly temperatures
from the monthly average temperatures. The fitted
line is a Gaussian distribution.
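
The skewness check described above can be reproduced with a small sketch. The function names are hypothetical, and the use of the sample standard deviation (division by N - 1) is an assumption, since the text does not say which normalization was used; for series of about a thousand points the difference is negligible.

```python
import math


def skewness(xs):
    """Coefficient of skewness: mean of the cubed standardized deviations, Eq. (7)."""
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: sample standard deviation with N - 1 in the denominator
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum(((x - mean) / sigma) ** 3 for x in xs) / n


def gaussian_skewness_std(n):
    """Approximate standard deviation of the skewness of n Gaussian samples."""
    return math.sqrt(6.0 / n)


months = 102 * 12                                # 102 years of monthly values
print(months)                                    # 1224
print(round(gaussian_skewness_std(months), 2))   # 0.07
```

Against this threshold of about 0.07, the station values quoted above for the fluctuations (between -0.06 and 0.09) are all of the same order as the Gaussian expectation rather than many times larger.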

We are aware that state or country boundaries are not
good indicators for identifying different regions; however,
within many small states or countries, the climate does
not change significantly. In cases where the climate is
different in different parts of the state, we have either
ignored the data from those stations, or just analyzed
one part of the state as in the case of coastal California.
Therefore, we can safely assume that the climates
in these states can be classified as Humid Subtropical -
Mediterranean (Coastal California), Humid Subtropical
- East Coasts (Alabama), Humid Continental - Hot Summers
- Year-round precipitation (West Virginia), Humid
Continental - Mild Summers - Year-round precipitation
(Maine) and Dry/Arid - Hot - Low Latitude desert (New
Mexico).

Figures 6-10 give the scaling exponent for coastal California,
Alabama, Maine, West Virginia, and New Mexico
versus the standard deviation of the temperature fluctuations,
respectively. In these figures we can see that
the scaling exponents crowd different regions of the graph,
indicating a possibility that these different climates can
be distinguished from each other using the method described
above. We must be clear about one point: If
we have only the standard deviations of the temperature
fluctuations and the scaling exponents resulting from
those distributions, obtaining clusters which would indicate
different climates is extremely difficult. However,
the question we are trying to answer is simpler: We know
where the stations are located (on the standard deviation
of temperature fluctuations versus the scaling exponents
map) for Humid Subtropical - Mediterranean climate.
Can we now identify an unknown station as having a
Humid Subtropical - Mediterranean climate? To be able
to answer this question we have used the support vector
machine (SVM) algorithm for data classification.

FIG. 6: The scaling exponents plotted against the standard
deviation of the temperature fluctuations for coastal California.

FIG. 7: The scaling exponents plotted against the standard
deviation of the temperature fluctuations for Alabama.

FIG. 8: The scaling exponents plotted against the standard
deviation of the temperature fluctuations for Maine.

FIG. 9: The scaling exponents plotted against the standard
deviation of the temperature fluctuations for West Virginia.

FIG. 10: The scaling exponents plotted against the standard
deviation of the temperature fluctuations for New Mexico.

A data classification task normally uses training and
testing data. Each instance in the training data set consists
of one target value (in our case belonging to a specific
climate type) and several features (like the standard
deviation of the temperature fluctuations and the scaling
exponent). The aim of SVM is to produce a model
which then predicts the target value of data instances in
the testing set. In the training set we have used 10 positive
target values (belonging to the climate class we are
analyzing) and 20 negative target points (belonging to
different climate classes). In the testing set, we supplied
the algorithm with all of the stations in our data set and
asked the program to identify the different regions. The
performance of such an algorithm is usually quantified by
its accuracy during the test phase, which mainly depends
on the correct treatment of true positives (TP) and true
negatives (TN). It is usually also important to distinguish
between two types of errors: a false positive (FP) and
a false negative (FN). Consequently, the performance of
the prediction is better judged if we add two more quantifiers,
sensitivity and specificity. The accuracy of the data
classification is defined as the ratio between the number
of correctly identified samples and the total number of
samples:

$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$    (8)

The sensitivity is the ratio between the number of true
positive predictions and the number of positive instances
in the test set:

$\mathrm{sensitivity} = \frac{TP}{TP + FN}$    (9)

Finally, the specificity is defined as the ratio between
the number of true negative predictions and the number
of negative instances in the test set:

$\mathrm{specificity} = \frac{TN}{TN + FP}$    (10)

For our initial analysis, we used only the five different
climates mentioned above. The results are summarized
in Table 1. Considering that we have a test set of 78 stations,
the deviations from 100% for any of the accuracy,
sensitivity, and specificity values are caused by at most 4
stations identified either as false positives or false negatives.
We believe that this small discrepancy is caused by
microclimatic behavior for those stations. For example,
the only false positive for Humid Subtropical - Mediterranean
climate comes from the only coastal station in
Alabama.

To test this method even further, we have used the SVM
algorithm on the remaining 51 stations. Out of these
51 stations, 31 were in Montana and Michigan (Humid
Continental - Mild Summers - Year-round precipitation
climate like Maine) and 20 were in Arizona (Dry/Arid
- Hot - Low Latitude desert climate like New Mexico).
Montana and Michigan are geographically different from
Maine both in distance from the coastline and elevation.
Arizona is geographically similar to New Mexico in distance
from the coastline and elevation. In the analysis,
we have treated Maine as our known climate, and, together
with the other states, Montana and Michigan as
our "unknowns". Table 2 shows the results of this analysis.
As expected, accuracy, sensitivity and specificity
dropped slightly when we considered significantly different
geographic locations. However, we still achieve more
than 95% in predicting the climate of Michigan and Montana
based solely on this analysis. In the second part of this
analysis, we have considered New Mexico as our known
climate, and, together with the other states in our dataset,
Arizona as our "unknown" climate. The results show that
all of the stations in Arizona except one are correctly identified
as belonging to the Dry/Arid - Hot - Low Latitude desert
climate. Improvement in the statistics is
caused by increasing the number of stations.


III. CONCLUSION 

In this study, the variability of the weather in differ- 
ent parts of the continental US, as an example of differ- 
ent climates, has been investigated. Our results suggest 
that different climates can be readily distinguished using 
the detrended fluctuation analysis method on the fluctu- 
ations of the maximum daily temperatures. Even though 
we have used state boundaries to define the climates, as 
long as a mild, subtropical, Mediterranean climate ex- 
ists in Coastal California, this method should be equally 
applicable to distinguish this climate from New Mexico 
where a dry, arid, hot desert climate is observed. 

The results presented here are preliminary, based on
stations with known climates. The real challenge lies in
the future ability of this method to be applied to paleoclimatic
data to reveal structure over timescales not only
of the order of decades but of millions of years. This
point remains speculative as no reliable monthly data exist
beyond 218 years (to our knowledge). However, the
fact that these data do not seem to scatter appreciably
at longer timescales gives us hope about expanding
the use of the method mentioned above.